Determining Number of Clusters Using Firefly Algorithm with Cluster Merging for Text Clustering

Authors

  • Athraa Jasim Mohammed
  • Yuhanis Yusof
  • Husniza Husni
Abstract

Text mining, and clustering in particular, is widely used by search engines to increase the recall and precision of a search query. The content of online websites (text, blogs, chats, news, etc.) is dynamically updated; nevertheless, relevant information on the changes made is not available. Such a scenario requires a dynamic text clustering method that operates without prior knowledge of the data collection. In this paper, a dynamic text clustering method that utilizes the Firefly algorithm is introduced. The proposed clustering algorithm, aFAmerge, automatically groups text documents into the appropriate number of clusters based on the behavior of fireflies and a cluster merging process. Experiments utilizing the proposed aFAmerge were conducted on two datasets: 20Newsgroups and the Reuters news collection. Results indicate that aFAmerge generates more robust and compact clusters than those produced by Bisect K-means and the practical General Stochastic Clustering Method (pGSCM).
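
The abstract does not spell out how aFAmerge works, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions. The `attractiveness` function uses the standard Firefly algorithm form, beta0 * exp(-gamma * r^2). The `merge_clusters` function is a hypothetical greedy merge on the cosine similarity of cluster centroids; the `threshold` parameter and all function names are illustrative and are not taken from the paper.

```python
import math


def attractiveness(beta0, gamma, distance):
    """Standard Firefly attractiveness: beta0 * exp(-gamma * r^2)."""
    return beta0 * math.exp(-gamma * distance ** 2)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def merge_clusters(clusters, threshold):
    """Assumed merge rule (not the paper's): repeatedly merge the pair of
    clusters with the most similar centroids until no pair reaches
    `threshold`. The number of clusters left is the estimated k."""
    clusters = [list(c) for c in clusters]  # copy so the input is untouched
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        sim, i, j = best
        if sim < threshold:
            break  # remaining clusters are all sufficiently distinct
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```

In this sketch the number of clusters is not supplied in advance: it is whatever remains once no two centroids are similar enough to merge, which is one simple way to read "determining the number of clusters" by merging.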

Similar resources

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When we deal with a huge volume of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for graph clustering, but it offers no way to select the number of clusters and the number of members ...

A Hybrid Grey based Two Steps Clustering and Firefly Algorithm for Portfolio Selection

Considering the concept of clustering, the main idea of the present study is based on the fact that all stocks for choosing and ranking will not be necessarily in one cluster. Taking the mentioned point into account, this study aims at offering a new methodology for making decisions concerning the formation of a portfolio of stocks in the stock market. To meet this end, Multiple-Criteria Decisi...

Document Clustering Based on Firefly Algorithm

Corresponding author: Athraa Jasim Mohammed, School of Computing, Universiti Utara Malaysia, Kedah, Malaysia. Abstract: Document clustering is widely used in Information Retrieval; however, existing clustering techniques suffer from the local optima problem in determining the number of clusters, k. Various efforts have been made to address this drawback, and this includes...

An Expansion of X-means for Automatically Determining the Optimal Number of Clusters

We expand a non-hierarchical clustering algorithm that can determine the optimal number of clusters by using iterations of k-means and a stopping rule based on the Bayesian Information Criterion (BIC). The procedure requires merging the clusters that a k-means iteration has made to avoid unsuitable division caused by the division order. By using this additional merging operation, the case of adequate...

Automatic Data Clustering Using an Improved Imperialist Competitive Algorithm

Imperialist Competitive Algorithm (ICA) is considered as a prime meta-heuristic algorithm to find the general optimal solution in optimization problems. This paper presents a use of ICA for automatic clustering of huge unlabeled data sets. By using proper structure for each of the chromosomes and the ICA, at run time, the suggested method (ACICA) finds the optimum number of clusters while optim...

Publication date: 2015